On Propagated Scoring for Semi-supervised Additive Models

Author

  • Mark Culp
Abstract

In this paper, a semi-supervised modeling framework that combines feature-based (x) data and graph-based (G) data for classification/regression of the response Y is presented. In this semi-supervised setting, Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). The Propagated Scoring algorithm proposed for fitting this model is a semi-supervised fixed point regularization approach that essentially extends the generalized additive model into the semi-supervised setting. We first articulate when semi-supervised degeneracies are expected within our framework and then provide a general regularization strategy to address such circumstances. For statistical analysis, we establish that the approach uses shrinking smoothers, give conditions under which the result is consistent, provide measures of inference and description, and establish clear connections to supervised models. Lastly, several semi-supervised approaches have been considered for the classification problem posed, typically motivated from an energy optimization perspective. In this work, we rigorously connect the statistically based propagated scoring framework to this class of approaches. This connection is insightful, particularly for comparisons with supervised models, since this type of analysis is lacking in the previous work. Two applications are presented: the first involves classification of protein location on a cell using a network of protein interaction data, and the second involves classification of text documents with citation network information and text data.

Some key words: Semi-supervised learning; Fixed Point Optimization; Additive Models.

Correspondence Information: M. Culp, Department of Statistics, West Virginia University. E-mail: [email protected]. The author thanks the editor, AE and two referees whose suggestions led to a substantial improvement of this work.
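As a rough illustration of the fixed-point idea on graph data, the sketch below iterates a simple neighbour-averaging smoother over the unlabeled responses while holding the labeled responses fixed. It is a generic graph-propagation scheme of the energy-optimization kind the abstract mentions, not the Propagated Scoring algorithm itself; the function name `propagate`, the dense weight-matrix format, and the stopping rule are all assumptions made for this example.

```python
def propagate(weights, labels, tol=1e-10, max_iter=10_000):
    """Hypothetical helper: graph smoothing by fixed-point iteration.

    weights: symmetric, non-negative adjacency matrix (list of lists)
             with a zero diagonal.
    labels:  observed response for labeled nodes, None for unlabeled.
    """
    n = len(weights)
    # Labeled responses stay clamped; unlabeled ones start at 0.
    f = [0.0 if y is None else float(y) for y in labels]
    unlabeled = [i for i, y in enumerate(labels) if y is None]

    for _ in range(max_iter):
        change = 0.0
        for i in unlabeled:
            degree = sum(weights[i])
            if degree == 0:
                continue  # isolated node: nothing to average over
            # Replace the value by the weighted mean of its neighbours.
            new = sum(weights[i][j] * f[j] for j in range(n)) / degree
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:  # values have stopped moving: fixed point reached
            break
    return f


# Path graph 0 - 1 - 2 - 3 with the two end nodes labeled.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = propagate(W, [0.0, None, None, 1.0])
```

On this path graph the unlabeled scores settle at values interpolating linearly between the two labeled ends, which is the fixed point of the averaging map. A feature-based component, as in the framework described above, would need an additional smoother on x and is deliberately left out of this sketch.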

Similar resources

Reducing Annotation Efforts in Supervised Short Answer Scoring

Automated short answer scoring is increasingly used to give students timely feedback about their learning progress. Building scoring models comes with high costs, as state-of-the-art methods using supervised learning require large amounts of hand-annotated data. We analyze the potential of recently proposed methods for semi-supervised learning based on clustering. We find that all examined metho...

An Iterative Algorithm for Extending Learners to a Semi-supervised Setting

In this paper, we present an iterative self-training algorithm, whose objective is to extend learners from a supervised setting into a semi-supervised setting. The algorithm is based on using the predicted values for observations where the response is missing (unlabeled data) and then incorporates the predictions appropriately at subsequent stages. Convergence properties of the algorithm are in...

Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. Similarly to multi-view learning, the “segmentation agreements” between the two different types of view are used to overcome the scarcity of the label information on unlabeled data. The proposed approach trains a character-based and word-based model on labele...

Transfer Learning in a Transductive Setting

Category models for objects or activities typically rely on supervised learning requiring sufficiently large training sets. Transferring knowledge from known categories to novel classes with no or only a few labels is far less researched even though it is a common scenario. In this work, we extend transfer learning with semi-supervised learning to exploit unlabeled instances of (novel) categori...

Prototype-Driven Learning for Sequence Models

We investigate prototype-driven learning for primarily unsupervised sequence modeling. Prior knowledge is specified declaratively, by providing a few canonical examples of each target annotation label. This sparse prototype information is then propagated across a corpus using distributional similarity features in a log-linear generative model. On part-of-speech induction in English and Chinese,...

Publication date: 2011